1.   Introduction

Suppose that a random sample of size $N = n_0 + n_1$ is drawn from a normal distribution $N(\mu, \sigma^2)$. The subsample consisting of the first $n_0$ observations will be called the initial sample and the remaining $n_1$ observations the future sample. Hickman [5] considered the problem of obtaining forecast intervals for the mean $\bar{X}_N$ and variance $S_N^2$ of the entire sample of size $N$ based on the mean $\bar{X}_{n_0}$ and variance $S_{n_0}^2$ of the initial sample; the end points of the forecast intervals for $\bar{X}_N$ and $S_N^2$ depend on the observations of the initial sample only via $\bar{X}_{n_0}$ and $S_{n_0}^2$. Hahn [2,3,4] derived prediction intervals for the mean and variance of the second sample as well as a simultaneous prediction interval to contain each of the $n_1$ observations of the second sample. He also considered the problem of constructing simultaneous prediction intervals for the variances of each of $k$ additional random samples of size $n_1$ based on an informative random sample of size $n_0$. In this paper we show that the results of Hickman and Hahn for random samples remain valid when the sample observations are correlated and have a special correlation structure such as interclass correlation. As an example, samples with interclass correlation occur in the study of random effects models in Analysis of Variance.


2.   Notation  And  Basic  Results 

Let $X_{ij}$, $j = 1, 2, \ldots, n_i$, $i = 0, 1, 2, \ldots, k$ be $(k+1)$ sets of random samples of size $n_i$ from a normal distribution $N(\mu, \sigma^2)$ and let $N = \sum_{i=0}^{k} n_i$. The means and variances of the $(k+1)$ sets of random samples and the pooled sample of $N$ observations are

$$\bar{X}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}, \qquad S_i^2 = \frac{1}{n_i} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2, \qquad i = 0, 1, \ldots, k$$

$$\bar{X} = \frac{1}{N} \sum_{i=0}^{k} \sum_{j=1}^{n_i} X_{ij}, \qquad S^2 = \frac{1}{N} \sum_{i=0}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2$$


It is convenient to use matrix notation and to express the sample variances as quadratic forms in deriving the results. Let the symbols $I$ and $E$ represent the identity matrix and a matrix all of whose elements are unity, respectively. Also, define the vector $X$ by

$$X = (X_{01}, X_{02}, \ldots, X_{0n_0}, X_{11}, X_{12}, \ldots, X_{1n_1}, \ldots, X_{k1}, X_{k2}, \ldots, X_{kn_k})'$$

Then,

$$N S^2 = X'BX$$

and

$$n_i S_i^2 = X'B_iX, \qquad i = 0, 1, 2, \ldots, k$$

where $B = I_{N \times N} - N^{-1} E_{N \times N}$ and the $B_i$ are $N \times N$ block diagonal matrices with the $i$th diagonal block equal to

$$I_{n_i \times n_i} - n_i^{-1} E_{n_i \times n_i}$$

and the rest of the blocks being zero matrices. It is easily shown that the matrices $B$ and $B_i$ are idempotent and that $X'BX/\sigma^2$ and $X'B_iX/\sigma^2$ have chi-square distributions with $(N-1)$ and $(n_i - 1)$ degrees of freedom (see Rao [10, Chapter 3]). Further, the quadratic form $X'BX$ can be partitioned as

$$X'BX = \sum_{i=0}^{k} X'B_iX + \sum_{i=0}^{k} n_i (\bar{X}_i - \bar{X})^2$$

It follows from an application of Hogg and Craig's Theorem [6, Chap. XIII] that $X'B_iX/\sigma^2$, $i = 0, 1, 2, \ldots, k$ are mutually independent chi-square variates and that $\sum_{i=0}^{k} n_i (\bar{X}_i - \bar{X})^2/\sigma^2 = X'B_{k+1}X/\sigma^2$ also has a chi-square distribution with $k$ degrees of freedom; as a consequence the matrix $B_{k+1}$ is also idempotent.
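The partition and the idempotency claims can be checked numerically. The following is a minimal sketch in plain Python; the sample sizes, the helper names (`centering_block`, `quad_form`) and the test vector are illustrative assumptions, not part of the original derivation.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def centering_block(sizes, i):
    """N x N matrix B_i: diagonal block i equals I - E/n_i, all other entries are zero."""
    N, start, n = sum(sizes), sum(sizes[:i]), sizes[i]
    B = [[0.0] * N for _ in range(N)]
    for r in range(start, start + n):
        for c in range(start, start + n):
            B[r][c] = (1.0 if r == c else 0.0) - 1.0 / n
    return B

def quad_form(x, B):
    """Quadratic form x'Bx."""
    return sum(x[r] * B[r][c] * x[c] for r in range(len(x)) for c in range(len(x)))

def max_abs_diff(P, Q):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

sizes = [4, 3, 5]                       # hypothetical n_0, n_1, n_2, so k = 2
N = sum(sizes)
B = centering_block([N], 0)             # B = I - E/N
B_parts = [centering_block(sizes, i) for i in range(len(sizes))]
# B_{k+1} = B - sum_i B_i carries the between-sample sum of squares
B_last = [[B[r][c] - sum(Bi[r][c] for Bi in B_parts) for c in range(N)] for r in range(N)]

idempotent_gap = max(max_abs_diff(matmul(M, M), M) for M in B_parts + [B, B_last])
rank_last = sum(B_last[r][r] for r in range(N))   # trace equals rank for an idempotent matrix

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(N)]
grand_mean = sum(x) / N
between, pos = 0.0, 0
for n in sizes:                          # sum_i n_i (mean_i - grand mean)^2 computed directly
    mean_i = sum(x[pos:pos + n]) / n
    between += n * (mean_i - grand_mean) ** 2
    pos += n
partition_gap = abs(quad_form(x, B_last) - between)
```

Here `rank_last` should equal $k$, the degrees of freedom attributed to $X'B_{k+1}X$, and both gaps should vanish up to rounding error.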

The construction of a simultaneous prediction interval to contain the variances of $k$ sets of future samples is based on a statistic whose distribution is known as the studentized largest (smallest) chi-square distribution. Suppose $Y_0$ is a chi-square random variable with $\nu_0$ degrees of freedom, $Y_1, Y_2, \ldots, Y_m$ are identically distributed as chi-square with $\nu_1$ degrees of freedom and $Y_0, Y_1, Y_2, \ldots, Y_m$ are mutually independent. Then, the distributions of

$$W_1 = \frac{\min(Y_1, Y_2, \ldots, Y_m)}{Y_0} \qquad \text{and} \qquad W_2 = \frac{\max(Y_1, Y_2, \ldots, Y_m)}{Y_0}$$

are known as the studentized smallest and largest chi-square distributions respectively. These distributions depend on three parameters $\nu_0$, $\nu_1$ and $m$, and Krishnaiah and Armitage [7,8] constructed tables of percentage points for the two distributions; the $100\gamma$ percentiles of the distributions will be denoted by $W_1(\nu_0, \nu_1, m, \gamma)$ and $W_2(\nu_0, \nu_1, m, \gamma)$.
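When the tables of Krishnaiah and Armitage are not at hand, a percentile of either distribution can be approximated by simulation. The sketch below is a Monte Carlo stand-in for the tabulated values, not the method used to build the tables; the function name, the number of draws and the seed are assumptions made for illustration.

```python
import random

def chi_square(rng, df):
    """Draw a chi-square variate with df degrees of freedom (gamma with shape df/2, scale 2)."""
    return rng.gammavariate(df / 2.0, 2.0)

def studentized_chi_square_percentile(nu0, nu1, m, gamma, largest=True, draws=20000, seed=0):
    """Monte Carlo estimate of the 100*gamma percentile of max(Y_1..Y_m)/Y_0, or of min(...)/Y_0."""
    rng = random.Random(seed)
    pick = max if largest else min
    ratios = []
    for _ in range(draws):
        y0 = chi_square(rng, nu0)                      # denominator Y_0
        ratios.append(pick(chi_square(rng, nu1) for _ in range(m)) / y0)
    ratios.sort()
    return ratios[min(draws - 1, int(gamma * draws))]  # empirical quantile

# Hypothetical parameters: nu_0 = 9, nu_1 = 4, m = 3, gamma = 0.95
w_largest = studentized_chi_square_percentile(9, 4, 3, 0.95)
w_smallest = studentized_chi_square_percentile(9, 4, 3, 0.95, largest=False)
```

The estimate carries simulation error, so it is suitable for checking orders of magnitude rather than for replacing the published percentage points.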
The multivariate $t$ distribution is used in deriving a simultaneous prediction interval for each observation in a future sample of size $n_1$ based on a prior sample of size $n_0$. Let $Z = (Z_1, Z_2, \ldots, Z_p)'$ be distributed as multivariate normal with zero mean vector and covariance matrix $\Lambda$; the diagonal elements of $\Lambda$ are all equal to $\sigma^2$ and all the off-diagonal elements are equal to $\rho\sigma^2$. If $S^2/\sigma^2$ is a chi-square variate with $\nu$ degrees of freedom distributed independently of $Z$, then the joint distribution of $t_1, t_2, \ldots, t_p$, where $t_i = \sqrt{\nu}\, Z_i / S$, is known as the central $p$-variate $t$ distribution. Krishnaiah and Armitage [9] tabulated the values $t_p(\nu, \rho, \gamma)$ such that

$$P[\, t_i < t_p(\nu, \rho, \gamma), \; i = 1, 2, \ldots, p \,] = \gamma$$

for various choices of $p$, $\nu$ and $\rho$. The percentage points $t(\nu, \gamma)$ and $F(\nu_1, \nu_2, \gamma)$ of the Student's $t$ distribution and the $F$ distribution are also used in constructing some of the prediction intervals.

A theorem due to Baldessari [1], used in extending the results of Hickman and Hahn, is stated below.

Baldessari Theorem: Let $X_{n \times 1}$ have a multivariate normal distribution $N(\boldsymbol{\mu}, V)$ where $\boldsymbol{\mu} = (\mu, \mu, \ldots, \mu)'$ and $V$ is a positive definite matrix, and let $B_0, B_1, \ldots, B_k$ be idempotent matrices satisfying

$$\sum_{j=0}^{k} B_j = I_{n \times n} - n^{-1} E_{n \times n}.$$

A necessary and sufficient condition for $X'B_jX/a$, $j = 0, 1, 2, \ldots, k$ to be independent and have chi-square distributions with degrees of freedom $r_j = \operatorname{rank} B_j$ is that the covariance matrix $V$ have the form

$$V_{n \times n} = \tfrac{1}{2}(A + A') + a(I - E) \tag{2.1}$$

where

$$A = \begin{pmatrix} a_1 & a_1 & \cdots & a_1 \\ a_2 & a_2 & \cdots & a_2 \\ \vdots & \vdots & & \vdots \\ a_n & a_n & \cdots & a_n \end{pmatrix}$$

and $a$ and $a_i$ are positive constants.

A covariance matrix with the structure defined in the Baldessari Theorem occurs in the study of the variance component model


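$$Y_{ij} = \mu + \alpha_i + e_{ij}, \qquad j = 1, 2, \ldots, n; \quad i = 1, 2, \ldots, k$$

where $\mu$ is a constant, the $\alpha_i$ are i.i.d. $N(0, \sigma_\alpha^2)$, the $e_{ij}$ are i.i.d. $N(0, \sigma^2)$, and $\alpha_i$ and $e_{ij}$ are mutually independent. In this case, it can be shown that $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in})'$ has a multivariate normal distribution with mean $\boldsymbol{\mu} = (\mu, \mu, \ldots, \mu)'$ and covariance matrix

$$V = \begin{pmatrix} \sigma^2 + \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma^2 + \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma^2 + \sigma_\alpha^2 \end{pmatrix}$$

To see that $V$ has the same form as in the Baldessari Theorem let

$$a_1 = a_2 = \cdots = a_n = \sigma^2 + \sigma_\alpha^2$$

and

$$a = \sigma^2.$$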
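The correspondence between form (2.1) and the variance component covariance matrix can be confirmed entry by entry. A minimal sketch follows; the variance components and dimension are hypothetical values chosen only for illustration.

```python
def baldessari_covariance(a_rows, a):
    """V = (A + A')/2 + a (I - E), where row i of A is constant and equal to a_rows[i]."""
    n = len(a_rows)
    return [[0.5 * (a_rows[r] + a_rows[c]) + a * ((1.0 if r == c else 0.0) - 1.0)
             for c in range(n)] for r in range(n)]

def variance_component_covariance(n, sigma_sq, sigma_alpha_sq):
    """sigma^2 + sigma_alpha^2 on the diagonal and sigma_alpha^2 everywhere else."""
    return [[sigma_alpha_sq + (sigma_sq if r == c else 0.0) for c in range(n)]
            for r in range(n)]

sigma_sq, sigma_alpha_sq, n = 2.0, 0.5, 4          # hypothetical variance components
V_form = baldessari_covariance([sigma_sq + sigma_alpha_sq] * n, a=sigma_sq)
V_model = variance_component_covariance(n, sigma_sq, sigma_alpha_sq)
largest_gap = max(abs(V_form[r][c] - V_model[r][c]) for r in range(n) for c in range(n))
```

With equal $a_i$ the term $\tfrac{1}{2}(A + A')$ reduces to $(\sigma^2 + \sigma_\alpha^2)E$, so the off-diagonal entries become $\sigma_\alpha^2$ after $aE = \sigma^2 E$ is subtracted.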


3.   Prediction  Intervals  for  Sample  Variances 

Suppose $X_{01}, X_{02}, \ldots, X_{0n_0}$ is an initial sample and $X_{i1}, X_{i2}, \ldots, X_{in_i}$, $i = 1, 2, \ldots, k$ are $k$ sets of future samples. Let $N = \sum_{i=0}^{k} n_i$ and $X = (X_{01}, \ldots, X_{0n_0}, \ldots, X_{k1}, \ldots, X_{kn_k})'$. It is assumed that $X$ is distributed as an $N$-variate normal with mean vector $\boldsymbol{\mu} = (\mu, \ldots, \mu)'$ and covariance matrix $V$ as in (2.1). The problem is to construct a simultaneous prediction interval to contain each of the sample variances $S_i^2$, $i = 1, 2, \ldots, k$. As indicated in Section 2, if $S^2$ is the variance of the pooled sample then

$$N S^2 = \sum_{i=0}^{k} n_i S_i^2 + \sum_{i=0}^{k} n_i (\bar{X}_i - \bar{X})^2$$

or equivalently, in matrix notation

$$X'BX = \sum_{i=0}^{k} X'B_iX + X'B_{k+1}X \tag{3.1}$$

For random samples, i.e., for $V = \sigma^2 I$, the variables $Q_i/\sigma^2 = X'B_iX/\sigma^2$, $i = 0, 1, 2, \ldots, k$ are distributed as chi-square and are mutually independent; by Hogg and Craig's Theorem [6], $Q_{k+1}/\sigma^2 = X'B_{k+1}X/\sigma^2$ is chi-square distributed, implying that $B_{k+1}$ is idempotent.

Since in equation (3.1) the matrices $B$ and $B_i$ satisfy (i) $B = \sum_{i=0}^{k+1} B_i$ and (ii) the matrices $B_i$ are idempotent, by the Baldessari Theorem $Q_i/a$, $i = 0, 1, 2, \ldots, k+1$ are mutually independent chi-square variates even for correlated samples with covariance matrix $V$ as in (2.1).
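For the equal-$a_i$ special case of (2.1), which is the variance component model, the conclusion can be seen directly: a random effect shared by every observation shifts all sample means by the same amount and leaves each sum of squares $Q_i$ unchanged. The sketch below illustrates this on simulated data; the sample sizes, seed and effect size are assumptions, and the example does not cover unequal $a_i$.

```python
import random

def sum_of_squares(sample):
    """Q = n * S^2, the sum of squared deviations from the sample mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)

rng = random.Random(1)
n0, n1 = 8, 5                                     # hypothetical sample sizes
errors = [rng.gauss(0.0, 1.0) for _ in range(n0 + n1)]
shared_effect = rng.gauss(0.0, 3.0)               # one random effect common to all observations
correlated = [10.0 + shared_effect + e for e in errors]

q0_indep, q1_indep = sum_of_squares(errors[:n0]), sum_of_squares(errors[n0:])
q0_corr, q1_corr = sum_of_squares(correlated[:n0]), sum_of_squares(correlated[n0:])
q_pooled_gap = abs(sum_of_squares(errors) - sum_of_squares(correlated))
```

Since the $Q_i$ computed from the correlated observations coincide with those computed from the independent errors, any statistic built from ratios of the $Q_i$ has the same distribution in both settings.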

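The prediction intervals for $S_i^2$, $i = 1, 2, \ldots, k$ are obtained as follows.

If $k = 1$, the variates $Q_0/a$, $Q_1/a$, $Q_2/a$ are independently distributed as chi-square with $n_0 - 1$, $n_1 - 1$ and $1$ degrees of freedom respectively. The ratio $(n_0 - 1)Q_1 / [(n_1 - 1)Q_0]$ has an $F$-distribution with $n_1 - 1$ and $n_0 - 1$ degrees of freedom. For a specified confidence coefficient $1 - \gamma$,

$$P\left[ F(n_1 - 1, n_0 - 1, \tfrac{\gamma}{2}) < \frac{(n_0 - 1) Q_1}{(n_1 - 1) Q_0} < F(n_1 - 1, n_0 - 1, 1 - \tfrac{\gamma}{2}) \right] = 1 - \gamma$$

and thus, since $Q_i = n_i S_i^2$,

$$P\left[ \frac{n_0 (n_1 - 1)}{n_1 (n_0 - 1)} S_0^2 \, F(n_1 - 1, n_0 - 1, \tfrac{\gamma}{2}) < S_1^2 < \frac{n_0 (n_1 - 1)}{n_1 (n_0 - 1)} S_0^2 \, F(n_1 - 1, n_0 - 1, 1 - \tfrac{\gamma}{2}) \right] = 1 - \gamma$$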


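A prediction interval for $S_N^2$, the variance of the pooled sample, is obtained by noting that $(n_0 - 1)(Q_1 + Q_2)/(n_1 Q_0)$ has an $F$ distribution with $n_1$ and $n_0 - 1$ degrees of freedom. A $100(1 - \gamma)\%$ prediction interval for $S_N^2 = (Q_0 + Q_1 + Q_2)/N$ is

$$P\left[ \left\{ \frac{n_1}{n_0 - 1} F(n_1, n_0 - 1, \tfrac{\gamma}{2}) + 1 \right\} \frac{Q_0}{N} < S_N^2 < \left\{ \frac{n_1}{n_0 - 1} F(n_1, n_0 - 1, 1 - \tfrac{\gamma}{2}) + 1 \right\} \frac{Q_0}{N} \right] = 1 - \gamma$$

where $Q_0 = n_0 S_0^2$. The two prediction intervals for $S_1^2$ and $S_N^2$ are exactly the same as the ones obtained by Hahn and Hickman for random samples.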
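The two intervals for $k = 1$ reduce to simple arithmetic once the $F$ percentage points are available. The following sketch takes those percentage points as arguments, to be read from tables or computed elsewhere; the function and parameter names are illustrative assumptions.

```python
def future_variance_interval(s0_sq, n0, n1, f_lower, f_upper):
    """Interval for S_1^2.

    f_lower and f_upper are the gamma/2 and 1 - gamma/2 percentage points
    of the F distribution with (n1 - 1, n0 - 1) degrees of freedom.
    """
    scale = n0 * (n1 - 1) / (n1 * (n0 - 1)) * s0_sq
    return scale * f_lower, scale * f_upper

def pooled_variance_interval(s0_sq, n0, n1, f_lower, f_upper):
    """Interval for S_N^2 with N = n0 + n1.

    f_lower and f_upper are the gamma/2 and 1 - gamma/2 percentage points
    of the F distribution with (n1, n0 - 1) degrees of freedom.
    """
    q0_over_n = n0 * s0_sq / (n0 + n1)             # Q_0 / N with Q_0 = n0 * S_0^2
    return (q0_over_n * (n1 * f_lower / (n0 - 1) + 1.0),
            q0_over_n * (n1 * f_upper / (n0 - 1) + 1.0))
```

Both functions use the divisor-$n$ variance $S_0^2$ of the initial sample, matching the definitions of Section 2; an initial variance computed with divisor $n_0 - 1$ would first have to be rescaled by $(n_0 - 1)/n_0$.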

For $k > 1$, a simultaneous prediction interval for $S_i^2$, $i = 1, 2, \ldots, k$ is derived by assuming that $n_i = n$, $i = 1, 2, \ldots, k$. The variate

$$W_2 = \max\left( \frac{Q_1}{Q_0}, \frac{Q_2}{Q_0}, \ldots, \frac{Q_k}{Q_0} \right)$$

has a studentized largest chi-square distribution with parameters $n_0 - 1$, $n - 1$ and $k$. Hence,

$$P\left[ Q_i \le W_2(n_0 - 1, n - 1, k, 1 - \gamma)\, Q_0, \; i = 1, 2, \ldots, k \right] = 1 - \gamma$$

and

$$P\left[ S_i^2 \le \frac{n_0}{n} W_2(n_0 - 1, n - 1, k, 1 - \gamma)\, S_0^2, \; i = 1, 2, \ldots, k \right] = 1 - \gamma$$

This simultaneous interval is an extension of Hahn's results. Lower bounds and two-sided bounds may be obtained using the studentized smallest chi-square variate $W_1$.
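The simultaneous upper bound can be examined by simulation. In the sketch below the percentage point $W_2(n_0 - 1, n - 1, k, 1 - \gamma)$ is estimated from independent samples and then applied to samples that share a common random effect; the Monte Carlo estimate stands in for the tabulated value, and the sample sizes, seeds and effect size are assumptions.

```python
import random

def sum_of_squares(sample):
    """Q = n * S^2, the sum of squared deviations from the sample mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)

def simulate_ratio(rng, n0, n, k, sigma_alpha):
    """max_i Q_i / Q_0 for one draw of k + 1 samples sharing a single random effect."""
    shared = rng.gauss(0.0, sigma_alpha)     # induces equal correlation among all observations

    def draw(size):
        return [shared + rng.gauss(0.0, 1.0) for _ in range(size)]

    q0 = sum_of_squares(draw(n0))
    return max(sum_of_squares(draw(n)) for _ in range(k)) / q0

def empirical_w2(n0, n, k, level, draws=20000, seed=0):
    """Monte Carlo stand-in for W_2(n0 - 1, n - 1, k, level), using independent samples."""
    rng = random.Random(seed)
    ratios = sorted(simulate_ratio(rng, n0, n, k, 0.0) for _ in range(draws))
    return ratios[min(draws - 1, int(level * draws))]

def coverage(n0, n, k, w2, sigma_alpha, reps=4000, seed=1):
    """Fraction of replications in which every S_i^2 <= (n0 / n) * w2 * S_0^2."""
    rng = random.Random(seed)
    hits = sum(simulate_ratio(rng, n0, n, k, sigma_alpha) <= w2 for _ in range(reps))
    return hits / reps

w2 = empirical_w2(n0=10, n=5, k=3, level=0.95)
covered = coverage(n0=10, n=5, k=3, w2=w2, sigma_alpha=2.0)
```

The condition $S_i^2 \le (n_0/n) W_2 S_0^2$ is equivalent to $Q_i \le W_2 Q_0$, which is what `coverage` counts; the observed fraction should be close to the nominal level despite the correlation.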

4.   Prediction  Intervals  For  The  Observations  In  A  Future  Sample. 

Let $X_1, X_2, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_{n+m}$ be a sample of size $N = n + m$ such that $X = (X_1, X_2, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_N)'$ has a multivariate normal distribution with mean $\boldsymbol{\mu} = (\mu, \ldots, \mu)'$ and covariance matrix $V$ as in (2.1). Let $(\bar{X}_n, S_n^2)$, $(\bar{X}_m, S_m^2)$ and $(\bar{X}_N, S_N^2)$ denote the mean and variance of the first $n$ observations, the second set of $m$ observations and the pooled sample of $N$ observations respectively.

For $m = 1$, that is, $N = n + 1$,

$$(n + 1) S_{n+1}^2 = n S_n^2 + \frac{n}{n + 1} (X_{n+1} - \bar{X}_n)^2$$

or as quadratic forms

$$X'BX = X'B_1X + X'B_2X$$

By the use of the Baldessari Theorem it can be concluded that

$$\frac{X'B_2X}{X'B_1X/(n - 1)} = \frac{n - 1}{n + 1} \cdot \frac{(X_{n+1} - \bar{X}_n)^2}{S_n^2}$$

is distributed as $F$ with $1$ and $n - 1$ degrees of freedom. Thus,

$$P\left[ \bar{X}_n - \left\{ \frac{(n + 1) S_n^2 \, F(1, n - 1, 1 - \gamma)}{n - 1} \right\}^{1/2} < X_{n+1} < \bar{X}_n + \left\{ \frac{(n + 1) S_n^2 \, F(1, n - 1, 1 - \gamma)}{n - 1} \right\}^{1/2} \right] = 1 - \gamma$$
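The sum-of-squares update behind the $m = 1$ case, and the resulting interval, can be sketched as follows. The data values are hypothetical, and the $F$ percentage point is passed in as an argument rather than computed.

```python
import math

def mean_and_variance(sample):
    """Sample mean and variance with divisor n, matching the definitions in the text."""
    n = len(sample)
    mean = sum(sample) / n
    return mean, sum((x - mean) ** 2 for x in sample) / n

def next_observation_interval(sample, f_point):
    """Interval for X_{n+1}; f_point is the point of F(1, n - 1) exceeded with probability gamma."""
    n = len(sample)
    mean, s_sq = mean_and_variance(sample)
    half_width = math.sqrt((n + 1) * s_sq * f_point / (n - 1))
    return mean - half_width, mean + half_width

# Check (n + 1) S_{n+1}^2 = n S_n^2 + n/(n + 1) (X_{n+1} - mean_n)^2 on hypothetical data.
first, new_value = [2.0, 4.0, 4.0, 5.0, 7.0], 9.0
n = len(first)
mean_n, s_sq_n = mean_and_variance(first)
_, s_sq_all = mean_and_variance(first + [new_value])
lhs = (n + 1) * s_sq_all
rhs = n * s_sq_n + n / (n + 1) * (new_value - mean_n) ** 2
```

Because the required point of $F(1, n - 1)$ is the square of a two-sided Student's $t$ point with $n - 1$ degrees of freedom, a $t$ table can supply `f_point` as well.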
If $m > 1$, a prediction interval for $\bar{X}_m$, the mean of the second sample, can be obtained following the same procedure as above, that is, starting with a partition of the sum of squares $N S_N^2$. The resulting interval would be exactly the same as the one obtained by Hickman for random samples.

To construct a simultaneous prediction interval for $X_{n+1}, X_{n+2}, \ldots, X_{n+m}$, let $Z_i = X_{n+i} - \bar{X}_n$, $i = 1, 2, \ldots, m$. Then $Z = (Z_1, Z_2, \ldots, Z_m)'$ can be expressed as $Z_{m \times 1} = C'_{m \times N} X_{N \times 1}$ where the matrix $C'$ is defined by

$$C'_{m \times N} = \left( -\tfrac{1}{n} E_{m \times n} \; : \; I_{m \times m} \right)$$

It follows that $Z$ is distributed as multivariate normal with zero mean vector and covariance matrix (see appendix for computations)

$$C'VC = a \left( \tfrac{1}{n} E_{m \times m} + I_{m \times m} \right).$$

Also, $Z = C'X$ and $n S_n^2 / a = X'B_1X / a$, which is chi-square distributed, are statistically independent since $C'VB_1 = 0$ (see appendix for computations), which is a sufficient condition for independence (see Rao [10, Chap. 3]).

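Let

$$W_i = \frac{Z_i}{[a(1 + \tfrac{1}{n})]^{1/2}} = \frac{X_{n+i} - \bar{X}_n}{[a(1 + \tfrac{1}{n})]^{1/2}}, \qquad i = 1, 2, \ldots, m.$$

Then, $W_1, W_2, \ldots, W_m$ are jointly distributed as an $m$-variate normal with zero mean vector and covariance matrix

$$\begin{pmatrix} 1 & \frac{1}{n+1} & \cdots & \frac{1}{n+1} \\ \frac{1}{n+1} & 1 & \cdots & \frac{1}{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{n+1} & \frac{1}{n+1} & \cdots & 1 \end{pmatrix}$$

Define the variates

$$t_i = \frac{W_i}{\sqrt{\dfrac{n S_n^2}{(n - 1) a}}} = \frac{X_{n+i} - \bar{X}_n}{S_n} \sqrt{\frac{n - 1}{n + 1}}, \qquad i = 1, 2, \ldots, m$$

The joint distribution of $t_1, \ldots, t_m$ is the central $m$-variate $t$ distribution with parameters $(n - 1)$ and $\rho = \frac{1}{n+1}$. If $t_m(n - 1, \frac{1}{n+1}, 1 - \frac{\gamma}{2})$ is the upper $(1 - \frac{\gamma}{2})$th percentage point of the $m$-variate $t$ distribution then

$$P\left[ \bar{X}_n - t_m(n - 1, \tfrac{1}{n+1}, 1 - \tfrac{\gamma}{2}) \sqrt{\tfrac{n + 1}{n - 1}}\, S_n < X_{n+i} < \bar{X}_n + t_m(n - 1, \tfrac{1}{n+1}, 1 - \tfrac{\gamma}{2}) \sqrt{\tfrac{n + 1}{n - 1}}\, S_n \;\; \text{for } i = 1, 2, \ldots, m \right] = 1 - \gamma$$

This result is identical to the prediction interval of Hahn for random samples.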
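Without the multivariate $t$ tables, a critical value for the simultaneous interval can be approximated by simulation. The sketch below swaps in a different technique from the tabulated one-sided point at $1 - \gamma/2$: it estimates the $1 - \gamma$ quantile of $\max_i |t_i|$ directly from independent standard normal draws. Function names, draw counts and seeds are assumptions.

```python
import math
import random

def simultaneous_critical_value(n, m, gamma, draws=20000, seed=0):
    """Monte Carlo 1 - gamma quantile of max_i |t_i|,
    where t_i = sqrt((n - 1)/(n + 1)) * (X_{n+i} - mean_n) / S_n."""
    rng = random.Random(seed)
    scale = math.sqrt((n - 1) / (n + 1))
    maxima = []
    for _ in range(draws):
        first = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(first) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in first) / n)   # divisor n, as in the text
        maxima.append(max(abs(scale * (rng.gauss(0.0, 1.0) - mean) / s) for _ in range(m)))
    maxima.sort()
    return maxima[min(draws - 1, int((1.0 - gamma) * draws))]

def simultaneous_interval(sample, critical_value):
    """Interval mean_n -/+ c * sqrt((n + 1)/(n - 1)) * S_n for all m future observations."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    half_width = critical_value * math.sqrt((n + 1) / (n - 1)) * s
    return mean - half_width, mean + half_width

critical_single = simultaneous_critical_value(n=10, m=1, gamma=0.05)
```

For $m = 1$ the statistic is an ordinary Student's $t$ with $n - 1$ degrees of freedom, so `critical_single` should land near the familiar two-sided 5% point for 9 degrees of freedom, which gives a quick sanity check on the simulation.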
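APPENDIX

Suppose $X = (X_1, X_2, \ldots, X_n, X_{n+1}, X_{n+2}, \ldots, X_{n+m})'$ is a sample of size $N = n + m$ that is jointly distributed as multivariate normal with mean vector $\boldsymbol{\mu} = (\mu, \ldots, \mu)'$ and covariance matrix

$$V_{N \times N} = \tfrac{1}{2}(A + A') + a(I - E)$$

where

$$A_{N \times N} = \begin{pmatrix} a_1 & a_1 & \cdots & a_1 \\ a_2 & a_2 & \cdots & a_2 \\ \vdots & \vdots & & \vdots \\ a_N & a_N & \cdots & a_N \end{pmatrix}$$

and $a$ and $a_i$ are positive constants. Let $Z = (Z_1, Z_2, \ldots, Z_m)'$ where $Z_i = X_{n+i} - \bar{X}_n$, $i = 1, 2, \ldots, m$. Then, if $C' = \left( -\tfrac{1}{n} E_{m \times n} : I_{m \times m} \right)$ as in Section 4,

$$Z = C'X$$

The joint distribution of $Z_1, Z_2, \ldots, Z_m$ is multivariate normal with mean $C'\boldsymbol{\mu} = 0$ and covariance matrix $C'VC = a\left( \tfrac{1}{n} E_{m \times m} + I_{m \times m} \right)$ as shown below:

Partition the matrix $A$ as

$$A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}$$

where $A_1$ is $n \times n$, $A_2$ is $n \times m$, $A_3$ is $m \times n$ and $A_4$ is $m \times m$.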
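Note that in each of the matrices $A_1, A_2, A_3, A_4$ the columns are identical and all elements in each row are the same. Then

$$C'V = \frac{1}{2} \begin{pmatrix} (-\tfrac{a_\cdot}{n} + a_{n+1})\, E_{1 \times N} \\ (-\tfrac{a_\cdot}{n} + a_{n+2})\, E_{1 \times N} \\ \vdots \\ (-\tfrac{a_\cdot}{n} + a_{n+m})\, E_{1 \times N} \end{pmatrix} + a \left( -\tfrac{1}{n} E_{m \times n} \; : \; I_{m \times m} \right)$$

where

$$a_\cdot = \sum_{i=1}^{n} a_i.$$

Hence,

$$C'VC = \left\{ \frac{1}{2} \begin{pmatrix} (-\tfrac{a_\cdot}{n} + a_{n+1})\, E_{1 \times N} \\ (-\tfrac{a_\cdot}{n} + a_{n+2})\, E_{1 \times N} \\ \vdots \\ (-\tfrac{a_\cdot}{n} + a_{n+m})\, E_{1 \times N} \end{pmatrix} + a \left( -\tfrac{1}{n} E_{m \times n} \; : \; I_{m \times m} \right) \right\} \begin{pmatrix} -\tfrac{1}{n} E_{n \times m} \\ I_{m \times m} \end{pmatrix} = a \left( \tfrac{1}{n} E_{m \times m} + I_{m \times m} \right)$$

The first term contributes nothing because every column of $C$ sums to zero, so that $E_{1 \times N} C = 0$.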
n 


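Further, if $S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$, then $n S_n^2 = X'B_1X$ where

$$B_1 = \begin{pmatrix} I_{n \times n} - n^{-1} E_{n \times n} & 0_{n \times m} \\ 0_{m \times n} & 0_{m \times m} \end{pmatrix}$$

The quadratic form $X'B_1X$ and the linear form $Z = C'X$ are independent since

$$C'VB_1 = \left\{ \frac{1}{2} \begin{pmatrix} (-\tfrac{a_\cdot}{n} + a_{n+1})\, E_{1 \times N} \\ (-\tfrac{a_\cdot}{n} + a_{n+2})\, E_{1 \times N} \\ \vdots \\ (-\tfrac{a_\cdot}{n} + a_{n+m})\, E_{1 \times N} \end{pmatrix} + a \left( -\tfrac{1}{n} E_{m \times n} \; : \; I_{m \times m} \right) \right\} \begin{pmatrix} I_{n \times n} - n^{-1} E_{n \times n} & 0 \\ 0 & 0 \end{pmatrix}$$

The last $m$ columns of this product are zero, and its first $n$ columns are

$$\frac{1}{2} \begin{pmatrix} (-\tfrac{a_\cdot}{n} + a_{n+1})\, E_{1 \times n} \\ (-\tfrac{a_\cdot}{n} + a_{n+2})\, E_{1 \times n} \\ \vdots \\ (-\tfrac{a_\cdot}{n} + a_{n+m})\, E_{1 \times n} \end{pmatrix} - \frac{a}{n} E_{m \times n} - \frac{1}{2} \begin{pmatrix} (-\tfrac{a_\cdot}{n} + a_{n+1})\, n^{-1} n\, E_{1 \times n} \\ (-\tfrac{a_\cdot}{n} + a_{n+2})\, n^{-1} n\, E_{1 \times n} \\ \vdots \\ (-\tfrac{a_\cdot}{n} + a_{n+m})\, n^{-1} n\, E_{1 \times n} \end{pmatrix} + \frac{a}{n} E_{m \times n} = 0$$

where $E_{1 \times n} E_{n \times n} = n E_{1 \times n}$ and $E_{m \times n} E_{n \times n} = n E_{m \times n}$ have been used.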





